@Aggarwal-Raghav
Contributor

No description provided.

@tez-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 22s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 1s No case conflicting files found.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+0 🆗 xmllint 0m 0s xmllint was not available.
+0 🆗 shelldocs 0m 0s Shelldocs was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
-1 ❌ test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
_ master Compile Tests _
+1 💚 mvninstall 8m 48s master passed
+1 💚 compile 0m 17s master passed
+1 💚 javadoc 0m 33s master passed
_ Patch Compile Tests _
+1 💚 mvninstall 0m 10s the patch passed
+1 💚 codespell 0m 28s No new issues.
+1 💚 compile 0m 8s the patch passed
+1 💚 javac 0m 8s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
-1 ❌ hadolint 0m 1s /results-hadolint.txt The patch generated 4 new + 0 unchanged - 0 fixed = 4 total (was 0)
+1 💚 shellcheck 0m 0s No new issues.
+1 💚 javadoc 0m 7s the patch passed
_ Other Tests _
+1 💚 unit 0m 9s tez-dist in the patch passed.
+1 💚 asflicense 0m 11s The patch does not generate ASF License warnings.
12m 3s
Subsystem Report/Notes
Docker ClientAPI=1.52 ServerAPI=1.52 base: https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-456/1/artifact/out/Dockerfile
GITHUB PR #456
Optional Tests dupname asflicense javac javadoc unit codespell detsecrets xmllint compile shellcheck shelldocs hadolint
uname Linux f8674a739c90 5.15.0-164-generic #174-Ubuntu SMP Fri Nov 14 20:25:16 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality /home/jenkins/jenkins-home/workspace/tez-multibranch_PR-456/src/.yetus/personality.sh
git revision master / ffceca5
Default Java Ubuntu-21.0.9+10-Ubuntu-124.04
Test Results https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-456/1/testReport/
Max. process+thread count 82 (vs. ulimit of 5500)
modules C: tez-dist U: tez-dist
Console output https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-456/1/console
versions git=2.43.0 maven=3.8.7 hadolint=1.18.0-0-g76eee5c codespell=2.4.1 shellcheck=0.7.1
Powered by Apache Yetus 0.15.1 https://yetus.apache.org

This message was automatically generated.

@Aggarwal-Raghav
Contributor Author

Aggarwal-Raghav commented Jan 24, 2026

@abstractdog, I was able to start DAGAppMaster with ZK locally. Attaching the container logs: docker_logs.txt

docker run -d \
        --name tez-am \
        -p 10001:10001 \
        -e TEZ_FRAMEWORK_MODE="STANDALONE_ZOOKEEPER" apache/tez-am:1.0.0-SNAPSHOT
brew install zookeeper
zkServer start

But this PR still has a lot of open items, and I need some advice on the following:

  1. Is the docker directory inside tez-dist fine, or should I create a separate sub-module for the Dockerfile-related code that is built after the tez-dist module?
  2. This image will presumably be run with ZK + K8s + S3. The question is whether we need a Hadoop tarball inside this image, just in case, for some third-party jars etc. If my understanding is correct, it shouldn't be there, but I've kept it for now. I will remove it if you say so.
  3. DAGAppMaster#main() reads a lot of environment variables, which I have mocked in entrypoint.sh for now. I'll try to improve this (suggestions are welcome).
  4. My tez-site.xml is not getting picked up from the classpath when calling:
    Configuration conf = new Configuration();
    I will debug that.
  5. Is there a way to test this AM container without YARN by running some job?
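For item 4, a quick first check is whether tez-site.xml is visible on the AM's classpath at all. As far as I know, Hadoop's plain `Configuration` only loads `core-default.xml` and `core-site.xml` as default resources, so `tez-site.xml` is read only when it is added explicitly (e.g. `conf.addResource("tez-site.xml")`) or through `TezConfiguration`. The dependency-free sketch below (the class name `TezSiteClasspathCheck` is hypothetical) asks the thread context class loader, which `Configuration` uses by default to resolve resource names, whether it can find the file. It is meant to be run inside the container with the same classpath as the AM.

```java
import java.net.URL;

public class TezSiteClasspathCheck {

    /** Returns the URL of the first tez-site.xml visible to the given loader, or null if absent. */
    static URL findTezSite(ClassLoader loader) {
        return loader.getResource("tez-site.xml");
    }

    public static void main(String[] args) {
        // Mirror Hadoop's Configuration: prefer the context class loader, fall back to this class's loader.
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = TezSiteClasspathCheck.class.getClassLoader();
        }
        URL url = findTezSite(loader);
        if (url == null) {
            System.out.println("tez-site.xml NOT found on the classpath");
        } else {
            System.out.println("tez-site.xml found at " + url);
        }
        // Print the effective classpath so the directory holding tez-site.xml can be verified.
        System.out.println("java.class.path=" + System.getProperty("java.class.path"));
    }
}
```

If the file is found but its values still do not show up, the likely culprit is that the AM builds a plain `Configuration` rather than one that adds `tez-site.xml`.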

@tez-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 22s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 detsecrets 0m 1s detect-secrets was not available.
+0 🆗 xmllint 0m 1s xmllint was not available.
+0 🆗 shelldocs 0m 1s Shelldocs was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
-1 ❌ test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
_ master Compile Tests _
+1 💚 mvninstall 8m 52s master passed
+1 💚 compile 0m 16s master passed
+1 💚 javadoc 0m 35s master passed
_ Patch Compile Tests _
+1 💚 mvninstall 0m 9s the patch passed
+1 💚 codespell 0m 28s No new issues.
+1 💚 compile 0m 9s the patch passed
+1 💚 javac 0m 9s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
-1 ❌ hadolint 0m 0s /results-hadolint.txt The patch generated 4 new + 0 unchanged - 0 fixed = 4 total (was 0)
+1 💚 shellcheck 0m 0s No new issues.
+1 💚 javadoc 0m 6s the patch passed
_ Other Tests _
+1 💚 unit 0m 8s tez-dist in the patch passed.
+1 💚 asflicense 0m 11s The patch does not generate ASF License warnings.
12m 7s
Subsystem Report/Notes
Docker ClientAPI=1.52 ServerAPI=1.52 base: https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-456/2/artifact/out/Dockerfile
GITHUB PR #456
Optional Tests dupname asflicense javac javadoc unit codespell detsecrets xmllint compile shellcheck shelldocs hadolint
uname Linux ff0b8c85cf42 5.15.0-164-generic #174-Ubuntu SMP Fri Nov 14 20:25:16 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality /home/jenkins/jenkins-home/workspace/tez-multibranch_PR-456/src/.yetus/personality.sh
git revision master / 856875a
Default Java Ubuntu-21.0.9+10-Ubuntu-124.04
Test Results https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-456/2/testReport/
Max. process+thread count 81 (vs. ulimit of 5500)
modules C: tez-dist U: tez-dist
Console output https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-456/2/console
versions git=2.43.0 maven=3.8.7 hadolint=1.18.0-0-g76eee5c codespell=2.4.1 shellcheck=0.7.1
Powered by Apache Yetus 0.15.1 https://yetus.apache.org


@abstractdog
Contributor

abstractdog commented Jan 26, 2026

Very good! Let me check this in detail sometime this week. In the meantime, here are some pointers responding to your questions:

  1. I believe we can follow Apache Hive in this area; feel free to do something similar to: https://github.com/apache/hive/tree/master/packaging

  2. We should keep the Hadoop jars. Even though the k8s environment is no longer a Hadoop/YARN environment, Tez depends heavily on Hadoop at both compile time and runtime, and this is something we don't intend to break in the short or mid term.

  3. I'll check it. What we should be really clear about is, e.g.:

# 3. NodeManager Details
export NM_HOST=${NM_HOST:-"localhost"}
export NM_PORT=${NM_PORT:-"12345"}

There is no YARN NodeManager in a k8s environment, so a reader of entrypoint.sh should see code that clearly distinguishes the env vars that are actually needed from the legacy/backward-compatible ones. In my opinion, that's what should be handled with care.

  4. Okay.

  5. Yeah. Given that neither Tez containers (TEZ-4665) nor LLAP containers (HIVE-29411) are implemented yet, we cannot successfully run a whole DAG, but we can get to the point where at least a DAG is successfully submitted from Hive to this AM container. To make this happen, I believe we need to make an HS2 container (see the Hive instructions for the dockerized setup) able to find this Tez AM container, so most probably we need to stop using tez.local.mode=true for this experiment.
    UPDATE: after creating HIVE-29419 for a separate TezAM image in Hive, testing this AM could be as simple as opening a TezClient to the AM container and submitting a DAG (with documentation attached).

@Aggarwal-Raghav
Contributor Author

Aggarwal-Raghav commented Jan 27, 2026

Thanks for the pointers, @abstractdog.

  1. Yes, the implementation is reminiscent of Hive (TBH, pom.xml, build-docker.sh, and some parts of the Dockerfile are taken from Hive to some extent).
  2. For a basic startup of the Tez AM without the Hadoop jars, I didn't observe any issues. The Tez tarball contains a few Hadoop jars, and I think they, along with their transitive dependencies, are sufficient for the Tez AM to act as a client of Hadoop services (but I have a commit ready in case we later want to remove the Hadoop tarball).
  3. No update. I believe a code change in DAGAppMaster is required for this segregation.
  4. Raised TEZ-4685: DagAppMaster is not picking tez-site.xml from classpath #458
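Regarding point 3, one possible shape for that segregation on the Java side is sketched below. It is only an illustration: the class `AmEnvResolver` and the split into "required" and "legacy" variables are assumptions; only `TEZ_FRAMEWORK_MODE`, `NM_HOST`, `NM_PORT` and the two defaults come from this thread. Legacy YARN variables get placeholder defaults with the same semantics as the shell `${VAR:-default}` form (unset or empty falls back to the default), while a missing required variable fails fast instead of being silently mocked.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class AmEnvResolver {

    // Assumption: variables the AM needs in every deployment mode.
    static final List<String> REQUIRED = List.of("TEZ_FRAMEWORK_MODE");

    // Assumption: YARN-era variables that only get placeholders for backward compatibility
    // (there is no NodeManager in a k8s environment). Defaults taken from entrypoint.sh.
    static final Map<String, String> LEGACY_DEFAULTS = Map.of(
            "NM_HOST", "localhost",
            "NM_PORT", "12345");

    static Map<String, String> resolve(Map<String, String> env) {
        Map<String, String> resolved = new LinkedHashMap<>();
        for (String name : REQUIRED) {
            String value = env.get(name);
            if (value == null || value.isEmpty()) {
                // Fail fast instead of silently mocking a value the AM really depends on.
                throw new IllegalStateException("Missing required environment variable: " + name);
            }
            resolved.put(name, value);
        }
        // Like ${VAR:-default}: an unset or empty legacy variable falls back to its placeholder.
        LEGACY_DEFAULTS.forEach((name, fallback) -> {
            String value = env.get(name);
            resolved.put(name, (value == null || value.isEmpty()) ? fallback : value);
        });
        return resolved;
    }
}
```

In the AM this would be called as `AmEnvResolver.resolve(System.getenv())`, which keeps the list of legacy variables in one place where a reader can see at a glance which ones are placeholders.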

A few additional things:

  1. DAGAppMaster#serviceInit() => DAGAppMaster#createTaskSchedulerManager tries to connect to the ResourceManager even in ZooKeeper mode. I think we shouldn't use the YARN scheduler here, and maybe move to YuniKorn (we use it for Spark internally). Let me know how to proceed. For now, should I raise a PR that skips it when ZK mode is enabled?
2026-01-27 19:13:06,207 INFO zookeeper.ZkAMRegistry: Added AMRecord to zkpath /tez-external-sessions/tez_am/server/application_1769280834537_0000
2026-01-27 19:13:06,208 INFO app.DAGAppMaster: Added AMRecord: {hostName=2d0733bd53ae, externalId=tez-session-, hostIp=172.17.0.2, port=10001, computeName=default-compute, appId=application_1769280834537_0000} to registry..
2026-01-27 19:13:06,210 INFO rm.TaskSchedulerManager: Creating YARN TaskScheduler: org.apache.tez.dag.app.rm.DagAwareYarnTaskScheduler
2026-01-27 19:13:06,253 INFO conf.Configuration: resource-types.xml not found
2026-01-27 19:13:06,253 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
2026-01-27 19:13:06,259 INFO Configuration.deprecation: io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum
2026-01-27 19:13:06,259 INFO Configuration.deprecation: yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
2026-01-27 19:13:06,263 INFO rm.DagAwareYarnTaskScheduler: scheduler initialized with maxRMHeartbeatInterval:1000 reuseEnabled:true reuseRack:true reuseAny:false localityDelay:250 preemptPercentage:10 preemptMaxWaitTime:60000 numHeartbeatsBetweenPreemptions:3 idleContainerMinTimeout:5000 idleContainerMaxTimeout:10000 sessionMinHeldContainers:0
2026-01-27 19:13:06,267 INFO client.DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at /0.0.0.0:8030
2026-01-27 19:13:07,572 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8030. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2026-01-27 19:13:08,580 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8030. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2026-01-27 19:13:09,588 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8030. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2026-01-27 19:13:10,595 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8030. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
  2. Disable tez.am.ui, as it also uses the YARN RM proxy.
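For the first item, the proposed stop-gap (skip the YARN scheduler when ZK mode is enabled) boils down to a mode check before the task scheduler is created. The log shows the RM proxy retrying 0.0.0.0:8030 with maxRetries=10 and sleepTime=1000 ms, i.e. about 10 × 1 s = 10 s of back-off for that connection attempt alone, which such a check would avoid. A minimal sketch, where the `FrameworkMode` enum, its `YARN` value and `taskSchedulerFor` are hypothetical names rather than existing Tez APIs; only `STANDALONE_ZOOKEEPER` and the `DagAwareYarnTaskScheduler` class name come from this thread:

```java
import java.util.Optional;

public class SchedulerSelection {

    // Hypothetical modes; STANDALONE_ZOOKEEPER matches the TEZ_FRAMEWORK_MODE value used in this PR.
    enum FrameworkMode { YARN, STANDALONE_ZOOKEEPER }

    static final String YARN_SCHEDULER = "org.apache.tez.dag.app.rm.DagAwareYarnTaskScheduler";

    /**
     * Returns the task scheduler class to instantiate, or Optional.empty() when no YARN
     * scheduler should be created (so nothing tries to reach the RM scheduler address).
     */
    static Optional<String> taskSchedulerFor(FrameworkMode mode) {
        switch (mode) {
            case STANDALONE_ZOOKEEPER:
                // No ResourceManager in this mode: skip the YARN scheduler entirely.
                return Optional.empty();
            case YARN:
            default:
                return Optional.of(YARN_SCHEDULER);
        }
    }

    public static void main(String[] args) {
        for (FrameworkMode mode : FrameworkMode.values()) {
            System.out.println(mode + " -> " + taskSchedulerFor(mode).orElse("(no YARN scheduler)"));
        }
    }
}
```

A longer-term change would more likely plug a non-YARN scheduler (e.g. one backed by YuniKorn, as suggested above) into the same decision point instead of returning nothing.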

@tez-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 22s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+0 🆗 xmllint 0m 0s xmllint was not available.
+0 🆗 shelldocs 0m 0s Shelldocs was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
-1 ❌ test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
_ master Compile Tests _
+0 🆗 mvndep 1m 41s Maven dependency ordering for branch
+1 💚 mvninstall 7m 45s master passed
+1 💚 compile 0m 45s master passed
+1 💚 checkstyle 0m 46s master passed
+1 💚 javadoc 0m 43s master passed
-1 ❌ spotbugs 1m 5s /branch-spotbugs-tez-dag.txt tez-dag in master failed.
-1 ❌ spotbugs 0m 19s /branch-spotbugs-tez-dist.txt tez-dist in master failed.
_ Patch Compile Tests _
+0 🆗 mvndep 0m 6s Maven dependency ordering for patch
+1 💚 mvninstall 0m 32s the patch passed
+1 💚 codespell 0m 27s No new issues.
+1 💚 compile 0m 27s the patch passed
+1 💚 javac 0m 27s the patch passed
-1 ❌ blanks 0m 0s /blanks-tabs.txt The patch 3 line(s) with tabs.
+1 💚 checkstyle 0m 13s the patch passed
-1 ❌ hadolint 0m 0s /results-hadolint.txt The patch generated 4 new + 0 unchanged - 0 fixed = 4 total (was 0)
-1 ❌ markdownlint 0m 2s /results-markdownlint.txt The patch generated 18 new + 0 unchanged - 0 fixed = 18 total (was 0)
-1 ❌ shellcheck 0m 0s /results-shellcheck.txt The patch generated 2 new + 0 unchanged - 0 fixed = 2 total (was 0)
+1 💚 javadoc 0m 14s the patch passed
-1 ❌ spotbugs 0m 17s /patch-spotbugs-tez-dag.txt tez-dag in the patch failed.
-1 ❌ spotbugs 0m 9s /patch-spotbugs-tez-dist.txt tez-dist in the patch failed.
_ Other Tests _
+1 💚 unit 5m 17s tez-dag in the patch passed.
+1 💚 unit 0m 10s tez-dist in the patch passed.
-1 ❌ asflicense 0m 15s /results-asflicense.txt The patch generated 1 ASF License warnings.
22m 33s
Subsystem Report/Notes
Docker ClientAPI=1.53 ServerAPI=1.53 base: https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-456/3/artifact/out/Dockerfile
GITHUB PR #456
Optional Tests dupname asflicense javac javadoc unit spotbugs checkstyle codespell detsecrets compile xmllint shellcheck shelldocs hadolint markdownlint
uname Linux 67dd80843a8f 5.15.0-164-generic #174-Ubuntu SMP Fri Nov 14 20:25:16 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality /home/jenkins/jenkins-home/workspace/tez-multibranch_PR-456/src/.yetus/personality.sh
git revision master / bfebf9e
Default Java Ubuntu-21.0.9+10-Ubuntu-124.04
Test Results https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-456/3/testReport/
Max. process+thread count 256 (vs. ulimit of 5500)
modules C: tez-dag tez-dist U: .
Console output https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-456/3/console
versions git=2.43.0 maven=3.8.7 hadolint=1.18.0-0-g76eee5c codespell=2.4.1 markdownlint=0.46.0 shellcheck=0.7.1
Powered by Apache Yetus 0.15.1 https://yetus.apache.org

